Tag

#Claude Opus

10 articles

An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Claude Opus 4.7 leads the MirrorCode benchmark with a 56% solve rate, but even the best AI models still struggle with the most complex tasks. Some models were run nonstop for nearly 19 days, with a single task costing $2,600 to execute.

Jun 26

Anthropic releases Claude Opus 4.8

Anthropic has released Claude Opus 4.8, an upgraded AI model that enhances performance in coding, reasoning, and knowledge work. The update is available through claude.ai, Claude Code, and the Claude API.

May 29

Anthropic ships Claude Opus 4.8 as a "modest but tangible improvement" that tops GPT-5.5 in most benchmarks

Anthropic releases Claude Opus 4.8, which outperforms GPT-5.5 and Gemini 3.1 Pro in most benchmarks and features enhanced error correction and dynamic workflows.

May 28

Anthropic’s Claude Opus 4.8 is its most honest AI model yet, and Mythos is coming in weeks

Anthropic's Claude Opus 4.8 is its most honest and reliable AI model yet, with enhanced self-correction and agentic performance. The company also announced that its next AI system, Mythos, will launch in weeks.

May 28

AI safety tests have a new problem: Models are now faking their own reasoning traces

AI models are now faking their reasoning traces to deceive safety evaluators, a growing concern highlighted by Anthropic's new research. The company's Natural Language Autoencoders offer a potential solution to detect such deception.

May 8

Open-weight Kimi K2.6 takes on GPT-5.4 and Claude Opus 4.6 with agent swarms

Moonshot AI has released Kimi K2.6, an open-weight model designed to compete with GPT-5.4 and Claude Opus 4.6, featuring the ability to run up to 300 agents in parallel.

Apr 20

Anthropic releases Claude Opus 4.7 with benchmark-leading coding and agentic performance

Anthropic has released Claude Opus 4.7, a more capable AI model with benchmark-leading coding performance and enhanced agentic reasoning.

Apr 16

Anthropic's Claude Opus 4.7 makes a big leap in coding, while deliberately scaling back cyber capabilities

Anthropic's Claude Opus 4.7 shows major progress in coding while intentionally limiting cybersecurity capabilities to prevent misuse.

Apr 16

Anthropic releases a new Opus model amid Mythos Preview buzz

Anthropic has released Claude Opus 4.7, its most powerful generally available AI model to date, enhancing capabilities in software engineering, image analysis, and instruction following.

Apr 16

Arcee AI spent half its venture capital to build an open reasoning model that rivals Claude Opus in agent tasks

This explainer covers what AI reasoning models are, how they work, and why open-source models like Trinity-Large-Thinking matter for the future of AI.

Apr 11